A Probabilistic Geocoding System based on a National Address File
Abstract
It is estimated that between 80% and 90% of governmental and business data collections contain address information. Geocoding – the process of assigning geographic coordinates to addresses – is becoming increasingly important in many application areas that involve the analysis and mining of such data. In many cases, address records are captured and/or stored in a free-form or inconsistent manner. This fact complicates the task of robustly matching such addresses to spatially annotated reference data. In this paper we describe a geocoding system that is based on a comprehensive high-quality geocoded national address database. It uses a learning address parser based on hidden Markov models to separate free-form addresses into components, and a rule-based matching engine to determine the best set of candidate matches to a reference file. The geocoding software modules are implemented (as part of the Febrl open source data linkage system) in the object-oriented language Python, which allows rapid prototype development and testing.
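The idea of parsing a free-form address with a hidden Markov model can be illustrated with a minimal sketch. This is not Febrl's code or API. The tag set, the coarse token classes, and every probability below are hypothetical values set by hand for illustration, whereas the system described in the abstract learns its parser from data. The sketch uses standard Viterbi decoding to pick the most likely sequence of address components for a list of tokens.

```python
import math

# Hypothetical tag set and hand-picked toy probabilities (assumptions, not
# values from the paper; a learning parser would estimate them from data).
STATES = ["number", "street_name", "street_type", "locality"]

START = {"number": 0.70, "street_name": 0.20, "street_type": 0.05, "locality": 0.05}

TRANS = {
    "number":      {"number": 0.05, "street_name": 0.85, "street_type": 0.05, "locality": 0.05},
    "street_name": {"number": 0.02, "street_name": 0.28, "street_type": 0.60, "locality": 0.10},
    "street_type": {"number": 0.02, "street_name": 0.03, "street_type": 0.05, "locality": 0.90},
    "locality":    {"number": 0.05, "street_name": 0.05, "street_type": 0.05, "locality": 0.85},
}

# Emission probabilities over coarse token classes rather than raw words.
EMIT = {
    "number":      {"digits": 0.90, "type_word": 0.02, "word": 0.08},
    "street_name": {"digits": 0.05, "type_word": 0.10, "word": 0.85},
    "street_type": {"digits": 0.01, "type_word": 0.94, "word": 0.05},
    "locality":    {"digits": 0.05, "type_word": 0.05, "word": 0.90},
}

# Hypothetical lookup list of street-type words.
STREET_TYPES = {"street", "st", "road", "rd", "avenue", "ave"}


def observe(token):
    """Map a lower-case token to one of the coarse observation classes."""
    if token.isdigit():
        return "digits"
    if token in STREET_TYPES:
        return "type_word"
    return "word"


def viterbi(tokens):
    """Return the most likely state sequence for the tokens (log-space Viterbi)."""
    if not tokens:
        return []
    obs = [observe(t) for t in tokens]
    # best[s] = (log probability of the best path ending in s, that path)
    best = {
        s: (math.log(START[s]) + math.log(EMIT[s][obs[0]]), [s]) for s in STATES
    }
    for o in obs[1:]:
        step = {}
        for s in STATES:
            score, path = max(
                ((best[p][0] + math.log(TRANS[p][s]), best[p][1]) for p in STATES),
                key=lambda pair: pair[0],
            )
            step[s] = (score + math.log(EMIT[s][o]), path + [s])
        best = step
    return max(best.values(), key=lambda pair: pair[0])[1]


def parse_address(text):
    """Tokenise a free-form address and label each token with a component tag."""
    tokens = text.lower().replace(",", " ").split()
    return list(zip(tokens, viterbi(tokens)))
```

Calling parse_address("42 Main Street, Sydney") labels the four tokens as number, street_name, street_type and locality under these toy probabilities. In a full geocoder the labelled components would then be passed to the matching engine that compares them against the reference address file.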
Similar resources
Comparing a single-stage geocoding method to a multi-stage geocoding method: how much and where do they disagree?
BACKGROUND Geocoding methods vary among spatial epidemiology studies. Errors in the geocoding process and differential match rates may reduce study validity. We compared two geocoding methods using 8,157 Washington State addresses. The multi-stage geocoding method implemented by the state health department used a sequence of local and national reference files. The single-stage method used a sin...
Assessing quality improvement initiatives when expert judgements are uncertain
A new approach for examining quality improvement initiatives regarding errors in the U.S. Census Bureau’s Master Address File (MAF) and the Topologically Integrated Geographic Encoding and Referencing (TIGER) system databases is presented. A stochastic multi-criteria decision-making method involving Bayesian weighted hierarchical multinomial logit models is used to conduct inference on the priorities in...
A comparison of address point, parcel and street geocoding techniques
The widespread availability of powerful geocoding tools in commercial GIS software and the interest in spatial analysis at the individual level have made address geocoding a widely employed technique in many different fields. The most commonly used approach to geocoding employs a street network data model, in which addresses are placed along a street segment based on a linear interpolation of t...
Using an Optimized Chinese Address Matching Method to Develop a Geocoding Service: A Case Study of Shenzhen, China
With the coming era of big data and the rapid development and widespread applications of Geographical Information Systems (GISs), geocoding technology is playing an increasingly important role in bridging the gap between non-spatial data resources and spatial data in various fields. However, Chinese geocoding faces great challenges because of the complexity of the address string format in Chine...
Accuracy of two geocoding methods for geographic information system-based exposure assessment in epidemiological studies
BACKGROUND Environmental exposure assessment based on Geographic Information Systems (GIS) and study participants' residential proximity to environmental exposure sources relies on the positional accuracy of subjects' residences to avoid misclassification bias. Our study compared the positional accuracy of two automatic geocoding methods to a manual reference method. METHODS We geocoded 4,247...
Publication date: 2004